Plagiarism detection using stopword n-grams

Author

  • Efstathios Stamatatos
Abstract

In this paper, a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n-grams reveal important information for plagiarism detection since they are able to capture syntactic similarities between suspicious and original documents and they can be used to detect the exact plagiarized passage boundaries. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive when compared with the best reported results. More importantly, it achieves significantly better results when dealing with difficult plagiarism cases where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms.
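The abstract states the idea only at a high level: keep just the stopwords of each document, form n-grams over that reduced sequence, and use n-grams shared with a source document to locate plagiarized passages. The Python sketch below illustrates that idea under loudly stated assumptions. The stopword list is a tiny hypothetical one, `n=4` is an illustrative value, and passage boundaries are approximated simply by merging overlapping matched n-grams into character spans. The names `STOPWORDS`, `stopword_profile`, `ngrams`, `merge_spans`, and `matched_spans` are hypothetical; the paper's actual stopword list, n-gram length, and boundary-detection heuristics are not reproduced here.

```python
import re

# Hypothetical stopword list for illustration only (NOT the paper's list).
STOPWORDS = frozenset({
    "the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
    "with", "he", "be", "on", "that", "by", "at", "as", "his", "this",
    "from", "but", "had", "which", "she", "they", "are", "not", "or", "an",
})

WORD_RE = re.compile(r"[a-z]+", re.IGNORECASE)


def stopword_profile(text):
    """Return the stopword sequence of `text` and each stopword's (start, end) offsets."""
    words, offsets = [], []
    for m in WORD_RE.finditer(text):
        w = m.group().lower()
        if w in STOPWORDS:  # content words are discarded entirely
            words.append(w)
            offsets.append((m.start(), m.end()))
    return words, offsets


def ngrams(seq, n):
    """Consecutive n-grams of `seq` as tuples (empty if len(seq) < n)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def merge_spans(spans):
    """Merge overlapping or touching (start, end) spans into maximal spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def matched_spans(suspicious, source, n=4):
    """Character spans of `suspicious` covered by stopword n-grams that also occur in `source`."""
    sus_words, sus_offsets = stopword_profile(suspicious)
    src_words, _ = stopword_profile(source)
    src_grams = set(ngrams(src_words, n))
    spans = []
    for i, gram in enumerate(ngrams(sus_words, n)):
        if gram in src_grams:
            # A matched n-gram covers the text from its first to its last stopword.
            spans.append((sus_offsets[i][0], sus_offsets[i + n - 1][1]))
    return merge_spans(spans)


if __name__ == "__main__":
    src = "It was the best of times and it was the worst of times."
    sus = "It was the finest of eras and it was the poorest of eras."
    print(matched_spans(sus, src))  # [(0, 51)]
```

Because the content words (best/finest, worst/poorest, times/eras) are ignored, the synonym-substituted sentence still matches its source, which is the kind of heavily modified case the abstract highlights. On real document collections a longer n and additional filtering would presumably be needed to avoid spurious matches between unrelated texts; the values here are illustrative only.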

Similar sources

Using a Variety of n-Grams for the Detection of Different Kinds of Plagiarism: Notebook for PAN at CLEF 2013

A text can be plagiarised in different ways. The text may be copied and pasted word by word, parts of the text may be changed, or the whole text may be summarised into one or two lines. Different kinds of plagiarism require different strategies to detect them. But rarely do we know beforehand what type of plagiarism we are dealing with. In this paper we present a system that can detect verbatim...

Plagiarism Detection in Programming Assignments Using Deep Features

This paper proposes a method for detecting plagiarism in source code using deep features. The embeddings for programs are obtained using a character-level Recurrent Neural Network (char-RNN), which is pre-trained on Linux Kernel source code. Many popular plagiarism detection tools are based on n-gram techniques at syntactic level. However, these approaches to plagiarism detection fail to captu...

Plagiarism Detection in Obfuscated Documents Using an N-gram Technique

Plagiarism is considered a major problem these days, especially in the case of academic institutions. Very often students present someone else’s work as their own and they are given credit for it. Therefore, we have to integrate plagiarism checking into the submission process, which includes the use of plagiarism detection software. In this paper we focus on vulnerabilities of this software from the...

Automatic Plagiarism Detection Using Word-Sentence Based S-gram

Plagiarism is an academic problem that is caught more and more each year. A common trick that cheaters use is inserting or removing a few extra terms, sentences, or paragraphs in the original copy to trick the reader into believing that the plagiarized copy and the original copy are unalike. This paper provides a new way to detect plagiarism by checking the similarity between sentences, and par...

ENCOPLOT: Pairwise Sequence Matching in Linear Time Applied to Plagiarism Detection

In this paper we describe a new general plagiarism detection method that we used in our winning entry to the 1st International Competition on Plagiarism Detection, the external plagiarism detection task, which assumes the source documents are available. In the first phase of our method, a matrix of kernel values is computed, which gives a similarity value based on n-grams between each source a...

Journal:
  • JASIST

Volume 62

Publication date: 2011